The sloppy model universality class and the Vandermonde matrix 
In a variety of contexts, physicists study complex, nonlinear models with many unknown or 
tunable parameters to explain experimental data. We explain why such systems so often are sloppy: 
the system behavior depends only on a few 'stiff' combinations of the parameters and is unchanged as 
other 'sloppy' parameter combinations vary by orders of magnitude. We contrast examples of sloppy 
models (from systems biology, variational quantum Monte Carlo, and common data fitting) with 
systems which are not sloppy (multidimensional linear regression, random matrix ensembles). We 
observe that the eigenvalue spectra for the sensitivity of sloppy models have a striking, characteristic 
form, with a density of logarithms of eigenvalues which is roughly constant over a large range. We 
suggest that the common features of sloppy models indicate that they may belong to a common 
universality class. In particular, we motivate focusing on a Vandermonde ensemble of multiparameter 
nonlinear models and show in one limit that they exhibit the universal features of sloppy models. 

PACS numbers: 02.30.Zz, 05.10.-a, 87.15.Aa, 87.16.Xa, 89.75.-k



Systems with many parameters are often sloppy. For 
practical purposes their behavior depends only on a few 
stiffly constrained combinations of the parameters; other 
directions in parameter space can change by orders of 
magnitude without significantly changing the behavior. 
Given a suitable cost $C(\mathbf{p})$ measuring the change in system
behavior as the parameters $\mathbf{p}$ vary from their original
values $\mathbf{p}^{(0)}$ (e.g., a sum of squared residuals), the
stiff and sloppy directions can be quantified as eigenvalues and
eigenvectors of the Hessian of the cost:
$H_{ij} = \partial^2 C / \partial p_i \partial p_j \,|_{\mathbf{p}^{(0)}}$.

Figure 1 shows the eigenvalues of the cost Hessian for many different systems;
those in (a), (b), (c), (d) and (h) are all sloppy. The sensitivity of model
behavior to changes along an eigenvector is given by the square root of the
eigenvalue. The range in eigenvalues of roughly one million for the sloppy
models therefore means that one must change parameters along the sloppiest
eigendirection a thousand times more than along the stiffest eigendirection in
order to change the behavior by the same amount. Although anharmonic effects
rapidly become important along sloppy eigendirections, a principal component
analysis of a Monte Carlo sampling of low-cost states has a similar spectrum of
eigenvalues; the sloppy eigendirections become curved sloppy manifolds in
parameter space. Similar sloppy behavior has been demonstrated in fourteen
systems biology models taken from the literature and in three multiparameter
interatomic potentials fit to electronic structure data. In these disparate
models we see a common, peculiar behavior: the nth stiffest eigendirection is
more important than the (n+1)th by a roughly constant factor, giving a total
range of eigenvalues of typically over a million for any model with more than
eight parameters. We call systems exhibiting these characteristic features
sloppy models.

This sloppiness has a number of important implications. In estimating
prediction errors, sloppiness both affects the estimation of statistical errors
due to uncertainties in the experimental data and allows an estimation of
systematic errors due to imperfections in the models (for example in
interatomic potentials and density functional theory). It makes extracting
parameter values from fits to sloppy models ill-posed [2], [8]. Conversely, it
is much more efficient to improve the predictivity of a model by fitting
parameters to system behavior than by designing experiments that precisely
determine the individual parameter values. Sloppy problems are also better
approached with optimization algorithms (like the Levenberg-Marquardt and
Nelder-Mead methods) which can adapt to widely diverging step sizes along
different parameter combinations.

FIG. 1: Eigenvalues giving the stiffness/sloppiness of various models as
parameters are varied. Each spectrum has been shifted so that the largest
eigenvalue is one. (a) Growth factor signaling model (coupled nonlinear ODEs)
for PC12 cells, as the 48 parameters (rate and Michaelis-Menten constants) are
varied. (b) Variational wave function used in quantum Monte Carlo, as the
Jastrow parameters (for electron-electron coincidence cusps) are varied.
(c) Radioactivity time evolution for a mixture of twelve common radionuclides
as the decay rates $\gamma_i$ (set by the half-lives) are varied. The
radionuclides are those available from Perkin-Elmer with half-lives less than
100 days. (Only the first nine eigenvalues are shown.) (d) The same exponential
decay model as in (c) with 48 decay constants $\gamma_i$ randomly spread over a
range of $e^{50}$. (e) One random 48×48 matrix in the Gaussian Orthogonal
Ensemble (GOE) (not sloppy). (f) A product of five random 48×48 matrices,
illustrating the random product ensemble (not sloppy, but ill conditioned).
(g) A plane in 48 dimensions fit to 68 data points, the same number of
parameters and data points as for the biology model in column (a) (Wishart
statistics, not sloppy). (h) A polynomial fit to data, as the 48 monomial
coefficients are varied (the Hilbert matrix, sloppy).

Let us begin with the famously ill-posed problem of fitting a sum of
exponentials to data. Consider a mixture of equal amounts of $N$ radioactive
elements whose decay signal is thus the sum of $N$ exponentials with decay
rates $(\gamma_1, \dots, \gamma_N)$. We define a cost function for general
decay rates as

$$C = \int_0^\infty \Big( \sum_{i=1}^{N} \exp(-\gamma_i t) - \sum_{i=1}^{N} \exp(-\gamma_i^{(0)} t) \Big)^2 \, d\log t$$

(spacing the 'data points' equally in log time makes analyzing large ranges of
decay constants convenient). Because the decay constants are positive and can
have a large range of sizes, we use their logarithms as our parameters
($p_i = \log \gamma_i$), giving model sensitivity to relative changes in the
decay rates. The resulting Hessian is

$$H_{ij}\big|_{\mathbf{p}^{(0)}} = \frac{2\, \gamma_i^{(0)} \gamma_j^{(0)}}{\big(\gamma_i^{(0)} + \gamma_j^{(0)}\big)^2}.$$

For the twelve radionuclides described in the caption to Figure 1(c), the
eigenvalues of the Hessian are each separated by nearly one decade; the
sloppiest mode has an eigenvalue a factor of $10^{10}$ smaller (less important)
than the stiffest. This is not the result of an inaccurate mathematical
description; it is true for the correct model and parameters with a complete
complement of data. The origin of sloppiness is not a simple lack of data where
trivial overparameterization leads to unidentifiable parameters.
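
As a rough numerical illustration of the exponential-fitting Hessian quoted above, the sketch below evaluates $H_{ij} = 2\gamma_i\gamma_j/(\gamma_i+\gamma_j)^2$ and its eigenvalues. The decay rates are hypothetical (twelve values evenly spaced in log over a range of 3.5), not the Perkin-Elmer radionuclide set used for Figure 1(c), and the function name is an illustrative choice.

```python
import numpy as np

def exponential_hessian(gammas):
    """Hessian H_ij = 2 g_i g_j / (g_i + g_j)^2 in log-rate parameters."""
    g = np.asarray(gammas, dtype=float)
    return 2.0 * np.outer(g, g) / np.add.outer(g, g) ** 2

# Hypothetical decay rates: 12 values evenly spaced in log over a range of 3.5.
gammas = np.exp(np.linspace(0.0, 3.5, 12))
H = exponential_hessian(gammas)

# Eigenvalues sorted from stiffest to sloppiest, normalized to the largest.
eigvals = np.sort(np.linalg.eigvalsh(H))[::-1]
print(np.log10(eigvals / eigvals[0]))
```

Every diagonal entry equals 1/2, so no single parameter is insensitive on its own; the small eigenvalues arise only from combinations of parameters that compensate one another.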

Unless the individual lifetimes are well separated, the net radiation cannot be
used to measure the lifetimes reliably. The difficulty is that the signal is
the sum of many functions with similar shapes; one can generate almost
identical signals with wildly different values for the parameters. Similarly,
the sloppiness in more realistic models is presumably due to the compensation
of subsets of parameters with similar effects. If we pick 48 lifetimes whose
logarithms are instead uniformly distributed over a range of $2\epsilon = 50$
(largest/smallest $\approx e^{50} \approx 5 \times 10^{21}$), the density of
levels and the variation in spacings between neighboring levels in the new
spectrum (Figure 1(d)) is similar to that of the real-life models in
Figures 1(a) and 1(b).

While a large number of models are sloppy, not all multiparameter models share
this quality. The simplest form of multiple linear regression, which is in
essence fitting a plane through the origin to a cloud of points, is not sloppy.
The Hessian matrix for this type of model is the sample covariance matrix of
the data points and is known as a Wishart matrix. The eigenvalues of a Wishart
matrix are described by the Marčenko-Pastur distribution, and an example is
seen in Figure 1(g). The classic ensembles of Random Matrix Theory [15], [16]
(Figure 1(e)) have uniform eigenvalue densities instead of the exponentially
large range characteristic of sloppy systems. The ensemble of products of
random matrices (Figure 1(f)) does mimic the exponential spacing of (singular)
values, but in this case the variance of level spacings is proportional to the
mean spacing. Toward the end of this paper we will see that for sloppy models,
in the limit of large spacings, the variance is instead independent of the
mean.
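
For contrast with the sloppy spectra, a minimal sketch of the linear-regression case follows. It assumes Gaussian random data points (a hypothetical stand-in for real data), uses the 48-parameter, 68-point sizes quoted for Figure 1(g), and takes the cost to be a plain sum of squared residuals, so the Hessian is proportional to $X^T X$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
n_points, n_params = 68, 48             # sizes quoted for Figure 1(g)

# Hypothetical data: each row is one data point in 48 dimensions.
X = rng.standard_normal((n_points, n_params))

# For the linear model y = X @ p with cost sum(r**2), the Hessian is 2 X^T X.
H = 2.0 * X.T @ X
eigvals = np.sort(np.linalg.eigvalsh(H))[::-1]
spread_decades = np.log10(eigvals[0] / eigvals[-1])
print(f"eigenvalue range: {spread_decades:.2f} decades")
```

The printed range should be a few decades at most, far short of the million-fold range that characterizes the sloppy examples.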

Why are so many models sloppy? We can gain insight by considering fitting data
for $x \in [0, 1]$ with polynomials. If one considers the polynomials of order
$N$ to be sums of monomials, $y(x, \mathbf{p}) = \sum_{i=0}^{N} p_i x^i$, the
Hessian is $H_{ij} = 2 A_{ij} = 2/(i+j+1)$, where $A$ is the famously
ill-conditioned Hilbert matrix (Figure 1(h)). Indeed, the coefficients of the
monomials are known to be poorly determined in such polynomial fits. Suppose we
instead generate the same polynomial fit, but parameterize our polynomial as a
sum of the appropriate shifted Legendre polynomials,
$y(x, \mathbf{p}') = \sum_{i=0}^{N} p'_i L_i(x)$ with $L_0 = 1$,
$L_1 = \sqrt{3}(2x - 1)$, $L_2 = \sqrt{5}(6x^2 - 6x + 1)$, .... The shifted
Legendre polynomials are orthonormal in the $L^2$ norm on [0,1], and the
Hessian in the $\mathbf{p}'$ basis is proportional to the identity matrix. By
changing our parameterization from monomial coefficients $\mathbf{p}$ to
coefficients $\mathbf{p}'$ in the appropriate orthonormal basis, our sloppiness
is completely cured. The sloppiness is due to the fact that the monomial
coefficients (natural from many perspectives) are a perverse set of coordinates
from the point of view of the behavior of the resulting polynomial. We can
quantify this by noting that the transformation $S_N$ from the monomial basis
to the orthonormal basis (the coefficients of the shifted Legendre polynomials)
has a tiny determinant, and therefore the volume enclosed by the monomial basis
vectors shrivels and becomes greatly distorted under the transformation. This
determinant can be found by noting that $S_N$ gives a Cholesky decomposition of
the Hilbert matrix, $A_N = S_N^T S_N$, and thus
$\det S_N = \sqrt{\det A_N}$, for which a closed-form expression is known.
Physically, the monomials all have roughly the same shape (starting flat near
zero, and rising sharply at the end near one), and can be exchanged for one
another, while the orthogonal polynomials all have quite distinct shapes. In
nonlinear sloppy models the sloppiness is more difficult to remove: (a) the
transformation to unsloppy parameters will be nonlinear away from the optimum,
often not even single-valued, (b) we may not have the insight or the ability to
change parameterizations to those natural for fitting purposes, and (c) often
the natural parameterization is determined by the science (as in biochemical
rate constants, arbitrary linear combinations of which are not biologically
motivated).
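
The change of basis described above can be checked numerically. The sketch below assumes a small illustrative size (six monomials) and takes $S$ to be the upper-triangular Cholesky factor of the Hilbert matrix; the helper names and the factorial closed form for the Hilbert determinant (the standard Cauchy-determinant result) are additions for illustration rather than expressions taken from the text.

```python
import math
import numpy as np

def hilbert(n):
    """n x n Hilbert matrix A_ij = 1 / (i + j + 1), indices starting at 0."""
    idx = np.arange(n)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)

def superfactorial(m):
    """Product 1! * 2! * ... * (m - 1)!."""
    return math.prod(math.factorial(i) for i in range(1, m))

n = 6                                   # monomials 1, x, ..., x^5 (illustrative)
A = hilbert(n)

# numpy returns lower-triangular L with A = L L^T, so S = L^T gives A = S^T S.
S = np.linalg.cholesky(A).T
S_inv = np.linalg.inv(S)

# Column j of S^{-1} holds the monomial coefficients of the j-th orthonormal
# polynomial; compare with L_1 = sqrt(3)(2x - 1), L_2 = sqrt(5)(6x^2 - 6x + 1).
L1 = S_inv[:2, 1]
L2 = S_inv[:3, 2]

# In the orthonormal basis the Gram matrix becomes the identity.
A_new = S_inv.T @ A @ S_inv

# det S = sqrt(det A); closed form det A = c(n)^4 / c(2n) for the Hilbert matrix.
det_A_closed = superfactorial(n) ** 4 / superfactorial(2 * n)
det_S = np.prod(np.diag(S))
print(det_S, math.sqrt(det_A_closed))
```

The determinant shrinks very quickly with `n`, which is the volume collapse the text attributes to the monomial basis.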

What causes this even distribution of relative stiffnesses over so many decades
of scales? To form strong conclusions about sloppy models we must establish
criteria sufficient to exclude the large variety of multiparameter systems that
will not be sloppy. First, we specialize to models where the cost is a sum of
squared residuals, $C(\mathbf{p}) = \sum_m r_m^2$, where the sum may be
continuous (e.g., an integral over time) and $r_m = y_m(\mathbf{p}) - d_m$ is
the deviation of theory $y(\mathbf{p})$ from the experimental datum $d_m$; all
of our examples of sloppy models are of this type. Second, to avoid including
systems where each parameter is the subject of a separate experiment isolating
that component, we make the (strong) assumption that all of the residuals
$r_m(\mathbf{p})$ depend on the parameters $\mathbf{p}$ in a symmetric fashion
(e.g., permuting $\mathbf{p}$ leaves $r_m$ unchanged). This allows us to recast
the residuals into the basis of power sum polynomials of the parameters,
$r_m(\mu_1, \mu_2, \dots)$ with $\mu_k = \sum_{i=1}^{N} p_i^k$, which can also
be viewed as the moments of the parameter distribution. Third, we noticed that
in fitting exponentials the compensable nature of different parameters
increased when they were restricted to smaller ranges; here we will assume that
the parameters are all confined to a small range
$p_i \in [\bar{p} \pm \epsilon]$. Thus if we define
$\epsilon_i = p_i - \bar{p}$, the residuals $r_m(\mu_1, \mu_2, \dots)$ can be
written as functions of the moments $\mu_k = \sum_{i=1}^{N} \epsilon_i^k$.

In general the Hessian is, up to an overall factor of 2,

$$H_{ij} = \sum_m \left( \frac{\partial r_m}{\partial p_i} \frac{\partial r_m}{\partial p_j} + r_m \frac{\partial^2 r_m}{\partial p_i \partial p_j} \right), \qquad (1)$$

but for the correct model at the true parameters the cost is zero, so
$r_m = 0$ and $H = J^T J$ with the Jacobian

$$J_{mi} = \frac{\partial r_m}{\partial p_i} = \sum_{k=1}^{K} \frac{\partial r_m}{\partial \mu_k}\, k\, \epsilon_i^{k-1} = \sum_{k=1}^{K} A_{mk} V_{ki}, \qquad (2)$$

where $A_{mk} = k\, \partial r_m / \partial \mu_k$,
$V_{kj} = \epsilon_j^{k-1}$, and $K$ is the maximum degree (possibly $\infty$)
to which we expand in $\epsilon$. Thus $H = J^T J = V^T A^T A V$. Here $V$, the
famous Vandermonde matrix, is the heart of the sloppy model universality class.
Reminiscent of random matrix theory ensembles, we are now interested in the
Vandermonde ensemble of Hessians of the form $V^T A^T A V$. The Vandermonde
matrix is well-known primarily because its determinant (for $N = K$) can be
expressed analytically, $\det(V) = \prod_{i<j} (\epsilon_i - \epsilon_j)$. As
$\epsilon \to 0$ this product is tiny, $\det(V) = O(\epsilon^{N(N-1)/2})$.
While the elements of $A$ do, in general, depend on the parameter values, they
either approach a constant or zero in this limit and we can see that the
determinant of $H$, $\det(H) = \det(V)^2 \det(A)^2$, is smaller still. As we
saw with the Hilbert matrix and fitting monomials to data, transformation
matrices with very small determinants are a signature of sloppy models.
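
A quick numerical check of the Vandermonde determinant and its scaling with the parameter range can be sketched as follows. The offsets `x` and the two range values are hypothetical, and the product is written as $\prod_{i<j}(\epsilon_j - \epsilon_i)$, which matches the row ordering of `np.vander` and agrees with the form in the text up to an overall sign convention.

```python
import numpy as np
from itertools import combinations

def vandermonde(eps):
    """V[k, j] = eps_j ** k for k = 0..N-1 (square case N = K, 0-based k)."""
    return np.vander(eps, increasing=True).T

def det_product_formula(eps):
    """prod_{i<j} (eps_j - eps_i), the analytic Vandermonde determinant."""
    pairs = combinations(range(len(eps)), 2)
    return np.prod([eps[j] - eps[i] for i, j in pairs])

x = np.array([-1.0, -0.4, 0.3, 1.0])    # hypothetical offsets, |x_i| <= 1
for scale in (0.1, 0.05):
    eps = scale * x
    print(scale, np.linalg.det(vandermonde(eps)), det_product_formula(eps))
```

Halving the range for $N = 4$ should shrink the determinant by $2^{N(N-1)/2} = 64$, consistent with $\det(V) = O(\epsilon^{N(N-1)/2})$.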

To show that the eigenvalues in our Vandermonde ensemble are evenly spread in
logarithm, we will make use of an apparent truth about matrices:

Conjecture 1 Let $S \in \mathbb{R}^{n \times n}$ be symmetric and positive
definite. Let $E \in \mathbb{R}^{n \times n}$ be diagonal with
$E_{ii} = \epsilon^{i-1}$ and $0 < \epsilon \ll 1$. Then the $m$th largest
eigenvalue of $ESE$ is $O(\epsilon^{2(m-1)})$ (less than some constant times
$\epsilon^{2(m-1)}$).
We have two reasons to believe this conjecture is true. (1) Treating the
off-diagonal components of $ESE$ as a perturbation, the corrections to the
$m$th eigenvalue are of order $\epsilon^{2(m-1)}$ to all orders in perturbation
theory, despite the fact that many of the perturbing elements are large
compared to the diagonal entries. (2) Extensive numerical tests show an even
sharper result: the $m$th largest eigenvalue, $\lambda_m$, is bounded above by
the $m$th largest row sum of $EES$ for all $\epsilon$, where the row sum for
row $k$ is $\sum_i \epsilon^{2(k-1)} |S_{ki}|$. This implies that
$\lambda_m \le \|S\|_\infty\, \epsilon^{2(m-1)}$, and also (for $\epsilon = 1$)
implies the remarkable apparent fact that the sorted eigenvalues of any
symmetric positive definite matrix are each bounded by their corresponding
sorted row sums. □
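
The kind of numerical test mentioned in reason (2) can be sketched in a few lines. The matrix `S` below is a hypothetical random symmetric positive definite matrix, and the function name, seed, and value of $\epsilon$ are illustrative choices; a single passing case is of course only a spot check, not evidence for the conjecture in general.

```python
import numpy as np

def eigs_and_row_sums(S, eps):
    """Eigenvalues of E S E and row sums of E E S, both sorted descending."""
    n = S.shape[0]
    E = np.diag(eps ** np.arange(n))            # E_ii = eps**(i-1), 1-based i
    eigvals = np.sort(np.linalg.eigvalsh(E @ S @ E))[::-1]
    row_sums = np.sort(np.abs(E @ E @ S).sum(axis=1))[::-1]
    return eigvals, row_sums

rng = np.random.default_rng(1)                  # arbitrary seed
B = rng.standard_normal((5, 5))
S = B @ B.T + np.eye(5)                         # hypothetical SPD test matrix

eigvals, row_sums = eigs_and_row_sums(S, eps=0.1)
for lam, bound in zip(eigvals, row_sums):
    print(f"{lam:.3e} <= {bound:.3e}")
```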

Motivated by numerical evidence that to leading order in $\epsilon$ the
eigenvectors of the Hessian are the right singular vectors of the Vandermonde
matrix, we shall transform into that basis. We first bound the singular values
of the Vandermonde matrix. Conveniently, $VV^T$ has the form necessary for
Conjecture 1. The singular values of $V$ are the positive square roots of the
eigenvalues of $VV^T$. Factoring the appropriate power of $\epsilon$ from each
row of the Vandermonde matrix gives $V = EX$ and $VV^T = EXX^TE$, where $E$ is
the same as in Conjecture 1 and the elements of $X$ are bounded by one.
Equating $XX^T$ with the matrix $S$ in Conjecture 1, we conclude that the
eigenvalues of $VV^T$ scale as $\lambda_m(VV^T) = O(\epsilon^{2(m-1)})$ and
thus $\sigma_m(V) = O(\epsilon^{m-1})$.
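
The claimed scaling of the singular values of the Vandermonde matrix can be inspected directly. This sketch assumes parameters of the form $\epsilon_j = \epsilon x_j$ with fixed hypothetical offsets $|x_j| \le 1$, so that $V = EX$ as in the text; the function name is illustrative.

```python
import numpy as np

def vandermonde_singular_values(x, eps, K):
    """Singular values of V[k, j] = (eps * x_j)**k, k = 0..K-1, descending."""
    V = np.vander(eps * np.asarray(x), N=K, increasing=True).T
    return np.linalg.svd(V, compute_uv=False)

x = np.array([-1.0, -0.6, -0.1, 0.5, 1.0])      # hypothetical offsets, |x_j| <= 1
for eps in (0.2, 0.1, 0.05):
    sigma = vandermonde_singular_values(x, eps, K=5)
    print(eps, np.log10(sigma))
```

Each successive singular value should drop by roughly one further power of $\epsilon$, so the printed logarithms are approximately evenly spaced.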

We now transform the Hessian into this basis, and again use Conjecture 1 to
bound its eigenvalues. Starting with the decomposition $H = V^T A^T A V$,
taking the singular value decomposition $V = U \Sigma W^T$, and transforming
the Hessian into the basis of the right singular vectors of $V$, we have
$W^T H W = \tilde{H} = \Sigma^T U^T A^T A U \Sigma$. We know that
$\Sigma_{ii} = O(\epsilon^{i-1})$. By construction the elements of $A$ are
well-behaved as $\epsilon \to 0$, and since $U$ is an orthogonal matrix its
elements too cannot diverge in this limit. This means that
$\tilde{H}_{ij} = O(\epsilon^{i+j-2})$. By Conjecture 1 we know that
$\lambda_i(\tilde{H}) = O(\epsilon^{2(i-1)})$, and since $\tilde{H}$ is simply
an orthogonal transformation of $H$,
$\lambda_i(H) = O(\epsilon^{2(i-1)})$. Rigorous universality is only expected
as the system size approaches infinity. Empirically we find, from studying a
variety of models, as well as subsystems of models (like PC12 in
Figure 1(a)), that models with more than roughly eight parameters are often
recognizably sloppy.
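
To see the resulting Hessian spectrum in a concrete case, one can draw a member of the Vandermonde ensemble numerically. In this sketch the matrix `A` is a hypothetical Gaussian random matrix held fixed as $\epsilon$ changes (the text only requires its elements to stay well-behaved as $\epsilon \to 0$), and the offsets `x`, seed, and function names are illustrative.

```python
import numpy as np

def vandermonde_hessian(A, x, eps):
    """H = V^T A^T A V with V[k, j] = (eps * x_j)**k (0-based k)."""
    K = A.shape[1]
    V = np.vander(eps * np.asarray(x), N=K, increasing=True).T
    J = A @ V
    return J.T @ J

rng = np.random.default_rng(2)                   # arbitrary seed
x = np.array([-1.0, -0.3, 0.4, 1.0])             # hypothetical offsets
A = rng.standard_normal((10, 4))                 # hypothetical A, held fixed

def log10_spectrum(eps):
    """log10 of the Hessian eigenvalues, sorted from stiffest to sloppiest."""
    H = vandermonde_hessian(A, x, eps)
    return np.log10(np.sort(np.linalg.eigvalsh(H))[::-1])

print(log10_spectrum(0.1))
print(log10_spectrum(0.05))
```

Much smaller values of `eps` push the sloppiest eigenvalues toward double-precision round-off, so the sketch stays at moderate ranges.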

Do these results tell us anything about the statistics of level spacings?
Unless two parameters are strictly degenerate or the residuals are independent
of a particular moment of the parameter distribution,
$\lambda_i = l_i \epsilon^{2(i-1)}$ for some non-zero coefficient $l_i$. The
relative spacing between neighboring eigenvalues is
$s_i = \log(\lambda_i/\lambda_{i+1}) = \log(l_i/l_{i+1}) - 2\log\epsilon$. For
a fixed model but an ensemble of random parameters, the distribution of
coefficients $l_i$ has a finite width as $\epsilon \to 0$. Therefore the
distribution of $s_i$ over the ensemble, normalized by $2|\log\epsilon|$ such
that the average spacing is unity, goes to one with a width which vanishes as
$\epsilon \to 0$. This means that the whole system is becoming not only more
sloppy (larger spacing) but it is becoming almost deterministically so (strong
level repulsion). Figure 1(c) is a clear depiction of this remarkably strong
level repulsion.
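
The spacing formula can be illustrated with the exponential-decay Hessian $H_{ij} = 2\gamma_i\gamma_j/(\gamma_i+\gamma_j)^2$ from earlier in the paper, restricted to a narrow range of log-rates. The offsets `x`, the centering at zero, and the values of $\epsilon$ are hypothetical choices for this sketch.

```python
import numpy as np

def exponential_hessian_from_logs(p):
    """H_ij = 2 g_i g_j / (g_i + g_j)^2 with g = exp(p)."""
    g = np.exp(np.asarray(p, dtype=float))
    return 2.0 * np.outer(g, g) / np.add.outer(g, g) ** 2

def log_spacings(x, eps):
    """s_i = log(lambda_i / lambda_{i+1}) for log-rates p_i = eps * x_i."""
    H = exponential_hessian_from_logs(eps * np.asarray(x))
    lam = np.sort(np.linalg.eigvalsh(H))[::-1]
    return np.log(lam[:-1] / lam[1:])

x = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])   # hypothetical offsets
for eps in (0.1, 0.05, 0.025):
    s = log_spacings(x, eps)
    print(eps, s / (-2.0 * np.log(eps)))            # normalized spacings
```

If $s_i = \log(l_i/l_{i+1}) - 2\log\epsilon$ holds, halving $\epsilon$ should raise every spacing by about $2\log 2$, independent of $i$.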

What is the link between the Vandermonde ensemble at small $\epsilon$ and the
behavior of real world sloppy models (Figure 1, columns (a), (b)) and also the
behavior at large $\epsilon$ (column (d))? These systems share the roughly
uniform density of log-eigenvalues over many decades that is the signature of
sloppy models but do not exhibit strong level repulsion. The real world models
also do not share the strict requirement that the residuals be perfectly
symmetric functions of the parameters. We conjecture that while not all of the
parameters are interchangeable in real world sloppy models, there are
Vandermonde subsystems lurking below the surface. Thus the fastest decay rates
in column (d) constitute one Vandermonde subsystem and the slowest decay rates
another. Indeed, the Poisson statistics of level spacings when fitting
exponential decays from a wide range (e.g. $2\epsilon = 50$ as in (d)) can be
reproduced by superimposing the spectra of several separate experiments, each
fitting decays from a narrower range (e.g. $2\epsilon = 3.5$ as in (c)). Such a
decomposition into Vandermonde subsystems is also illustrated by modifying the
net radiation model to include the initial amounts of the elements as unknown
parameters. Now the parameters clearly separate into two classes: decay rates
and initial amounts. Each class alone fits the assumptions of the Vandermonde
ensemble, produces rigidly sloppy spectra (strong level repulsion), and
generates nearly equivalent patterns of changes in the residuals. When mixed
together, however, the fact that parameters from one class cannot compensate
for parameters of the other class destroys the correlations between levels and
they no longer repel each other. Similarly, a full many body wave function in
quantum Monte Carlo [19] decomposes into the sloppy space of the Jastrow
parameters in Figure 1(b) and a non-sloppy subspace of the Configuration
Interaction coefficients describing single-particle orbitals.

These results motivate algorithms for the decomposi- 
tion of real world sloppy models into rigidly sloppy Van- 
dermonde subspaces whose components are effectively re- 
dundant. Such a decomposition would be useful for three 
separate reasons: a) explaining why a particular model 
is sloppy overall, b) suggesting routes for model reduc- 
tion and coarse graining by subsuming degrees of freedom 
within Vandermonde systems, and c) prescribing changes 
in parameters to alter specific aspects of model behavior. 

Complex models from a wide array of scientific fields 
are sloppy: they each have an exponentially large range of 
sensitivities to changes in underlying parameter values. 
This occurs because the parameters natural for experi- 
mental manipulation or human description are often a 
severe distortion of the basis natural for describing sys- 
tem behavior. Far from being a deficiency, sloppiness is 
in fact a saving grace of complex models — provided the 
right combinations of parameters are known they pro- 
vide nontrivial and well-constrained predictions despite 
surprisingly unconstrained parameters overall. Under- 
standing the origins and implications of sloppiness in its 
various incarnations offers new, fundamental insights into 
complex systems. 